Run auto-tagging only if no documents found by OCR #918
Open
SCDevel wants to merge 1 commit into
This is to help prevent a task-switching overhead that comes from the potentially long startup times of models.
Walkthrough
The auto-tagging step in background.go now executes conditionally: only when OCR produces no documents. Previously, auto-tagging always ran after OCR regardless of its results. This optimization prevents redundant processing and improves efficiency by skipping unnecessary auto-tagging operations.
ivanzud added a commit to ivanzud/paperless-gpt that referenced this pull request Mar 8, 2026
ISSUE
Currently, if someone applies the AUTO_TAG and AUTO_OCR_TAG tags to some documents, waits a short time (e.g. 15 seconds), and then tags more documents, the second batch may not be OCR'd until after tagging is complete. This would be fine if LLM startup times were not potentially very slow. (I initially hosted the models on an HDD, where startup could take minutes.)
SOLUTION
If OCR has processed documents, skip tagging and check OCR again. This rechecks for new documents, which in theory keeps the OCR model alive and prevents the slow start/stop behavior.
This also helps when users mass-upload files with a workflow that sets the auto tags on consumption (or use the folder tags feature).
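The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in background.go: the `step` type and `runOnce` function are hypothetical names, and the real OCR and tagging passes are replaced by stand-in functions that just report how many documents they handled.

```go
package main

import "fmt"

// step is a hypothetical stand-in for one processing pass (OCR or tagging).
// It returns how many documents it handled.
type step func() (int, error)

// runOnce performs one iteration of the background loop as proposed:
// OCR runs first, and auto-tagging runs only when OCR found nothing.
// It reports which step did work: "ocr", "tagging", or "idle".
func runOnce(ocr, tagging step) (string, error) {
	n, err := ocr()
	if err != nil {
		return "", fmt.Errorf("ocr: %w", err)
	}
	if n > 0 {
		// OCR had work: skip tagging so the next iteration re-checks OCR.
		return "ocr", nil
	}
	n, err = tagging()
	if err != nil {
		return "", fmt.Errorf("tagging: %w", err)
	}
	if n > 0 {
		return "tagging", nil
	}
	return "idle", nil
}

func main() {
	// Number of documents the fake OCR pass finds on successive checks.
	queue := []int{2, 1, 0}
	i := 0
	ocr := func() (int, error) { n := queue[i]; i++; return n, nil }
	tagging := func() (int, error) { return 3, nil }

	for range queue {
		did, err := runOnce(ocr, tagging)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(did)
	}
}
```

With this shape, tagging only gets a turn once an OCR check comes back empty, so consecutive OCR batches are handled back to back.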
FLAW
If the documents are processed fast enough, a later batch could still miss this second check.
A delay before the recheck might help; it would probably still be faster overall, even with a 10-20 second delay.
TESTING
I have not tested this, as it is a relatively small change. I also honestly don't know how the models are started and stopped, so this may not fundamentally work, but I figured the regular maintainers would know that better than me.